- Doug here,
-
- > I just finished some new PMMU software for the Afterburner - it uses some
- > nifty features of the 68040's new cache to improve speed and reduce reliance
- > on the Falcon BUS & external FastRAM.
-
- > Didn't they support copy-back mode before or what?
-
- No, they didn't - but I'm not all that sure I blame them.
-
- Copyback mode is not exactly Falcon-friendly, as they don't seem to have
- connected up the bus-snooping lines to monitor the DMA bus request signals, and
- this leads to a lack of coherency between cache and memory. The trick is knowing
- where and when it's safe to map a page as copyback instead of write-through.
- Getting this even slightly wrong results in bad crashes, even during the boot
- sequence.
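To make the coherency problem concrete, here is a toy model in Python. It is not how the real cache behaves cycle by cycle. It has one cache line in front of RAM, and a DMA device that reads RAM directly with no bus snooping. The class and function names are made up for illustration, and allocating the line on every write miss is a simplification.

```python
# Toy model: one 16-byte cache line in front of RAM, plus a DMA device that
# reads RAM directly and never sees the cache (no bus snooping).
LINE = 16

class ToyCache:
    def __init__(self, ram, copyback):
        self.ram = ram
        self.copyback = copyback
        self.line = None            # cached copy of the block, None = invalid
        self.dirty = False

    def write(self, off, value):
        if self.line is None:       # simplification: allocate on every write miss
            self.line = bytearray(self.ram)
        self.line[off] = value
        if self.copyback:
            self.dirty = True       # RAM only updated when the line is pushed
        else:
            self.ram[off] = value   # write-through: every store reaches RAM

    def push(self):
        # Rough equivalent of pushing the line out to memory (CPUSH on the 68040).
        if self.line is not None and self.dirty:
            self.ram[:] = self.line
            self.dirty = False

def dma_sees(copyback, push_first):
    ram = bytearray(LINE)
    cpu = ToyCache(ram, copyback)
    cpu.write(0, 0xAA)              # CPU fills a buffer the DMA device will read
    if push_first:
        cpu.push()
    return ram[0]                   # the device reads RAM, never the cache

for cb, push in [(False, False), (True, False), (True, True)]:
    print(f"copyback={cb!s:5} push={push!s:5} -> DMA reads 0x{dma_sees(cb, push):02X}")
```

With write-through the device always gets the CPU's data. With copyback it gets stale RAM (0x00 here) until the line is pushed. Keeping anything DMA touches out of copyback pages avoids relying on the push, which is the "where and when" choice described above.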
-
- > Remember the GEMBench results for FPU calculation? It used to be about 133%,
- > which I always said looked 'funny' to say the least - well below spec.
-
- > I actually wrote a longish article about GEMBench vs AB040 after I read
- > some of your earlier comments in NeST. I've not yet posted that, but in
- > essence it says that GEMBench's processor tests are just about completely
- > worthless.
-
- Yes, I have looked at the object code, and to say that it's even just a
- real-world test as opposed to a performance benchmark would be doing it
- a big favour. Took me a good 2 minutes to actually find the FPU instructions
- in there - and it's not supposed to do anything else... :)
-
- The only thing it's really good for is testing something against itself, and
- very approximately against its predecessors. No sense of scale involved of
- course, but it shows up when things don't work. In this case, the FPU didn't
- work (in a manner of speaking).
-
- > For floating point GEMBench uses an autodetecting FP library, which means
- > that everything is done through functions, and the FPU registers are only
- > used internally in those. For simple things, like FADD etc, most of the
- > time is spent accessing the stack and such...
-
- Yes, I know - and the M68040FPSP library makes intensive use of the supervisor
- stack, which is why I mapped it copyback in the first place. I was not happy
- with the FPU results, basically because they indicated lower performance than
- a 68882 at the same clock rate, which I knew to be wrong from the scientific
- journals I acquired when the chip was announced some years ago.
-
- > (It's likely that the stack accesses are what your new PMMU setup improves.)
-
- That's exactly what's happening. The FPU itself makes little use of the data cache
- other than the odd memory reference when converting / storing floats. I didn't
- expect the data cache to improve the arithmetic capabilities of the chip, but rather
- to reduce its reliance on external RAM.
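A rough way to see why stack traffic benefits is to count external bus writes for the same stream of stores under each mode. This sketch rests on several assumptions. The stack address is made up. The 96-byte frame pushed 1000 times is hypothetical. The working set is assumed never to be evicted. The count is of transactions rather than cycles, and a copyback line push is a burst, so it costs more than a single write. The 16-byte line size matches the 68040's data cache.

```python
LINE = 16  # 68040 data cache line size in bytes

def bus_writes(addresses, copyback):
    """Count external bus write transactions for a stream of stores.

    Toy model: the working set is assumed to stay resident in the data
    cache, so there are no evictions, and in copyback mode each dirty
    line is pushed exactly once at the end.
    """
    if not copyback:
        return len(addresses)                   # write-through: every store goes out
    return len({a // LINE for a in addresses})  # copyback: one push per dirty line

SP_TOP = 0x8000                                 # hypothetical stack top
FRAME_BYTES = 96                                # hypothetical frame size
frame = [SP_TOP - 4 * (i + 1) for i in range(FRAME_BYTES // 4)]  # 4-byte pushes
stores = frame * 1000                           # same frame pushed 1000 times

print("write-through:", bus_writes(stores, copyback=False))  # prints 24000
print("copyback:     ", bus_writes(stores, copyback=True))   # prints 6
```

The model says nothing about the arithmetic itself. It only shows how much less often the CPU has to go out to the Falcon bus when a hot stack area is copyback.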
-
- > The memory test is not quite that bad, but comparing the figures between
- > an ST and a Falcon doesn't really say anything at all.
-
- I know - but it does indicate the difference in execution speed between Falcon030
- & 68040 when running GEMBench - which is all most people can relate to. I have a
- complete cycle sheet here and I KNOW the behaviour of most of the instruction types
- when in the pipeline, but it's hard to articulate this in a way that has any depth.
-
- The 68040 can perform a double-precision floating-point multiply in 4 cycles,
- compared with the 68882's high double figures to low triple figures - but it's
- not easy for everyone to visualise all the complexities of waitstates,
- scoreboarding etc.
-
- > With the new software, the FPU calculation is 360%!
-
- > The FPU in the '040 is actually _ten_ times faster than an equal clocked
- > '882 on (most of) the things it can do. For the non-simple things, which
- > have to be emulated by software, it's not much faster than the '882.
- > Naturally, a large part of GEMBench's FP testing uses those...
-
- Of course - and concurrency is lost during such operations, especially when
- they reference RAM.
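GEMBench is mainly useful for comparing a setup against itself. On that basis, the two FPU scores quoted above are best read as a ratio on the same benchmark and the same chip:

```latex
\[
  \frac{360\%}{133\%} \approx 2.7
\]
```

That gain comes from the memory side, with the supervisor stack now copyback. It does not reflect the FPU doing its arithmetic any faster.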
-
- Interestingly, POV-Raytrace runs about 3 times faster with the super-stack
- in copyback mode than it does without - I wonder how it would perform with the
- more intensively used user-stack? It's a bit hard to keep track of the user stack
- though, and more than a little unsafe too. Still might be worth patching... :)
-
- > I'll be releasing the new software when it's ready...
-
- > I _really_ must find the money for an AfterBurner soon!
-
- I was disappointed at first, as I had some problems with crashing and so on,
- but after getting my 68040 user manual I was able to actually fix things and
- experiment a bit. Worthwhile in the end - I would have been completely lost
- without that book.
-
- I just made a mod to the software to copy ROM into FastRAM and shadow it with the
- MMU, but it's hard to tell exactly what difference it's made to the performance
- of the machine in general - GEMBench reports nearly 1000% on the ROM, but we all
- know what that means... :)
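The ROM shadowing trick amounts to two steps. First, copy each ROM page into FastRAM. Second, change the page table so the ROM's logical addresses translate to the copy. A toy Python model of that follows, with these assumptions:

* The ROM base and size (512K of TOS at 0xE00000) are my assumption about the Falcon.
* The FastRAM address is hypothetical.
* The page table is a plain dict rather than the 68040's real descriptor format.
* Keeping the shadow pages write-protected, so stray writes still fault as they would on real ROM, is an assumed design choice.

```python
PAGE = 4096                  # 68040 supports 4K or 8K pages; 4K assumed here
ROM_BASE = 0xE00000          # assumed Falcon TOS ROM base
ROM_SIZE = 512 * 1024        # assumed TOS ROM size
SHADOW_BASE = 0x01000000     # hypothetical FastRAM address for the copy
NPAGES = ROM_SIZE // PAGE

phys = {}                    # toy physical memory: page number -> bytearray
page_table = {}              # toy MMU: logical page -> (physical page, write_protect)

# Fill the "ROM" with recognisable bytes and identity-map it, write-protected.
for i in range(NPAGES):
    pn = ROM_BASE // PAGE + i
    phys[pn] = bytearray([i % 256]) * PAGE
    page_table[pn] = (pn, True)

def translate(addr, write=False):
    """Logical -> physical address, faulting on writes to protected pages."""
    phys_page, write_protect = page_table[addr // PAGE]
    if write and write_protect:
        raise PermissionError(f"write to protected page at 0x{addr:08X}")
    return phys_page * PAGE + addr % PAGE

def read(addr):
    p = translate(addr)
    return phys[p // PAGE][p % PAGE]

def shadow_rom():
    """Copy every ROM page into FastRAM, then repoint the logical ROM page
    at the copy, keeping it write-protected like the real ROM."""
    for i in range(NPAGES):
        logical = ROM_BASE // PAGE + i
        src, _ = page_table[logical]
        dst = SHADOW_BASE // PAGE + i
        phys[dst] = bytearray(phys[src])    # the copy
        page_table[logical] = (dst, True)   # the remap

probe = ROM_BASE + 5 * PAGE + 7
before = read(probe)
shadow_rom()
print(read(probe) == before, hex(translate(probe)))  # prints: True 0x1005007
```

The toy model skips two steps the real chip would need. The copy would have to be pushed out of the data cache before code runs from it, and the ATC would have to be flushed (PFLUSHA) after the tables change.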
-
- Doug.
-
-